Comparison of the efficiency of data mining methods in predicting type 2 diabetes

Authors

  • Habibollah Esmaily Department of Epidemiology and Biostatistics, Faculty of Health, Mashhad University of Medical Sciences, Mashhad, Iran.
  • Hossein Tireh Department of Epidemiology and Biostatistics, Faculty of Health, Mashhad University of Medical Sciences, Mashhad, Iran.
  • Mohammad Taghi Shakeri Department of Epidemiology and Biostatistics, Faculty of Health, Mashhad University of Medical Sciences, Mashhad, Iran.
  • Razieh Yousefi Department of Epidemiology and Biostatistics, Faculty of Health, Mashhad University of Medical Sciences, Mashhad, Iran.
  • Sadegh Rasoulinezhad Department of Epidemiology and Biostatistics, Faculty of Health, Mashhad University of Medical Sciences, Mashhad, Iran.
Abstract:

Background: Diabetes mellitus is a chronic disease, the most common disease caused by metabolic disorders, and one of the most important health issues worldwide. Data mining methods are now applied in many fields of science because of their capabilities. In this study, we therefore compared the efficiency of data mining methods in predicting type 2 diabetes.

Methods: This cross-sectional study used data collected in 2016 from 7,000 participants in the Diabetes Screening Project in Samen, Mashhad City, Iran. Of these, 540 were untreated diabetic patients. The Samen Project included the routine examinations of diabetes patients, such as blood glucose, eye health, nephropathy, and leg health. To keep the two groups balanced, 600 healthy individuals were selected by proportional-to-size sampling, giving a total sample size of 1,140 people. People with diabetes aged over 30 years were enrolled; participants with a previous history of type 2 diabetes whose blood glucose was normal at the time of the study, because of drug use or other reasons, were excluded.

Results: All three models (logistic regression, simple (naive) Bayesian, and support vector machine) had the same test accuracy (86%). In terms of the area under the receiver operating characteristic (ROC) curve (AUC), however, the logistic regression and simple Bayesian models performed better than the support vector machine model (AUC = 90% versus AUC = 88%). In the simple Bayesian and logistic regression models, body mass index (BMI) and age were the most important variables, while BMI and blood pressure were the most important variables in the support vector machine model.

Conclusion: All three models had the same accuracy. In terms of AUC, the logistic regression and simple Bayesian models performed better than the support vector machine model. Overall, the three models performed almost equally well. In all three models, BMI was the most important variable.
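The Results report equal test accuracy for all three models but different AUC values. A minimal sketch of how that can happen is given below, using only the Python standard library. The labels and scores are made-up toy values, not the study's data, and the function names `accuracy` and `auc` are illustrative assumptions rather than anything used by the authors.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of cases whose thresholded score matches the true label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, y_true))
    return correct / len(y_true)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = diabetic, 0 = healthy (not from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
# Two hypothetical models that differ only in the score given to one
# misclassified diabetic case.
model_a = [0.9, 0.8, 0.7, 0.40, 0.6, 0.3, 0.2, 0.1]
model_b = [0.9, 0.8, 0.7, 0.05, 0.6, 0.3, 0.2, 0.1]

for name, scores in (("model_a", model_a), ("model_b", model_b)):
    print(name, "accuracy:", accuracy(y_true, scores), "AUC:", auc(y_true, scores))
```

Both toy models make the same classifications at the 0.5 threshold, so their accuracy is identical, yet the AUC differs because AUC depends on how the scores rank cases rather than on a single cut-off. This is one reason to report both measures when comparing classifiers.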





Journal title

Volume 77, Issue 5

Pages 301-307

Publication date: 2019-08

